This is a summary of the single cell analysis performed after demultiplexing with Freemuxlet.
In general, the results are very consistent with what we had previously submitted. The analysis parameters are very similar to what was done before i.e. we didn’t have to force anything to make the results come out the same. This is good.
The bulk of the analysis was performed using Monocle3 functions with occasional modifications as noted.
If you are interested in browsing the code, here is what is in each file:
These are the number of cells identified from the first level of freemuxlet analysis:
## # A tibble: 91 x 3
## # Groups: doubletCalls [4]
## doubletCalls FNL.BEST.GUESS n
## <chr> <chr> <int>
## 1 Singlet 0,0 9621
## 2 Singlet 1,1 8361
## 3 Singlet 10,10 9615
## 4 Singlet 11,11 1808
## 5 Singlet 2,2 10983
## 6 Singlet 3,3 8497
## 7 Singlet 4,4 13052
## 8 Singlet 5,5 9577
## 9 Singlet 6,6 7720
## 10 Singlet 7,7 8709
## 11 Singlet 8,8 10098
## 12 Singlet 9,9 7322
## 13 Freemuxlet.DBL 1,0 828
## 14 Freemuxlet.DBL 10,0 889
## 15 Freemuxlet.DBL 10,1 1099
## 16 Freemuxlet.DBL 10,2 992
## 17 Freemuxlet.DBL 10,3 875
## 18 Freemuxlet.DBL 10,4 1086
## 19 Freemuxlet.DBL 10,5 1423
## 20 Freemuxlet.DBL 10,6 1211
## # … with 71 more rows
The FNL.BEST.GUESS corresponds to the cluster assignment provided directly by freemuxlet. See “freemuxlet_data_out/fml_metadata_summary1.csv”
After incorporating Ravi’s genotyping information, the singlets were assigned to the patient id’s we are familiar with (see “freemuxlet_data_out/singlet_only_summary.csv”:
## # A tibble: 12 x 3
## VWsample n running_total
## <chr> <int> <int>
## 1 E01 18329 18329
## 2 E02 6054 24383
## 3 E03 17680 42063
## 4 E04 14258 56321
## 5 E05 7353 63674
## 6 E06 9216 72890
## 7 E07 2108 74998
## 8 E09 6675 81673
## 9 E10 2428 84101
## 10 E11 9491 93592
## 11 E12 4262 97854
## 12 E13 7400 105254
QC was performed to remove cells with a low number of features or genes, and a high percentage of mitochondrial RNA. Cutoffs were determined using the defaults for the “isOutlier” function in scater which is 3 MADs in the bad direction. Each calculation was performed on individual 10X runs (e.g. “cyan1”).
Distibution of features prior to thresholding:
channels_nfeature_rna_cds5
Feature thresholds applied:
qc_thresholds_features
## # A tibble: 10 x 3
## channel lower higher
## <chr> <dbl> <dbl>
## 1 cyan1 105. Inf
## 2 cyan2 93.6 Inf
## 3 blue1 129. Inf
## 4 blue2 175. Inf
## 5 green1 224. Inf
## 6 green2 256. Inf
## 7 red1 119. Inf
## 8 red2 117. Inf
## 9 yellow1 156. Inf
## 10 yellow2 198. Inf
Distribution of features after thresholding:
channels_nfeature_rna_cds6
Distribution of mitochondrial RNA percentage prior to thresholding:
channels_pct_counts_mito_cds5
qc_thresholds_mito
## # A tibble: 10 x 3
## channel lower higher
## <chr> <dbl> <dbl>
## 1 cyan1 -Inf 17.2
## 2 cyan2 -Inf 16.4
## 3 blue1 -Inf 17.2
## 4 blue2 -Inf 16.6
## 5 green1 -Inf 12.3
## 6 green2 -Inf 12.9
## 7 red1 -Inf 12.1
## 8 red2 -Inf 12.6
## 9 yellow1 -Inf 12.6
## 10 yellow2 -Inf 11.3
Distribution of mitochondrial RNA percentage after thresholding:
channels_pct_counts_mito_cds6
These show that the cells were good quality basically across the board. I broke it down by patient and the results were very similar. I can send those plots if you are interested.
The following is a table of cds cell numbers at all steps of preprocessing:
cds_cell_numbers
## # A tibble: 9 x 3
## cds_vec description_vec nrow_vec
## <chr> <chr> <int>
## 1 cds0 made from combine_cds 175667
## 2 cds1 added metadata columns 175667
## 3 cds2 filtered out cells without freemuxlet data 175403
## 4 cds2.1 filtered for singlets in doubletCalls 105363
## 5 cds2.2 filtered out ambiguous cells from cyan 1 and cyan 2 105254
## 6 cds3 calculated QC metrics 105254
## 7 cds4 added metadata columns 105254
## 8 cds5 removed metadata columns 105254
## 9 cds6 removed low quality cells 103415
So in summary, there were many cells filtered out by freemuxlet and doublet finder. About 1% of cells failed qc.
CSVs with these data can be found in the freemuxlet_data_out and qc_data_out folders in the repository.
Using standard PCA calculations and UMAP dimension reduction we get something very similar to what we had before:
cds_previz
Performing batch correction gives us this:
cds_aligned_previz
Basically the T cells look almost the same, so the major effects are to reduce the variability in the myelod blast cells. I don’t think this is what we want to do so I proceeded with the non-batch-corrected data.
As described in the overview, clustering was performed using Leiden and Paritioning methods with default Monocle settings. Top Markers are available in “data_out/partition_markers.csv”. They are almost identical as before. The only thing was that the large cluster of blasts from patient 1 got put in their own cluster this time. This really doesn’t affect our results.
Here are selected top markers. I think something like this would look good for the paper:
marker_gene_dotplot
I think this can take the place of the single UMAP scatter plots for the most part.
Here are the UMAP plots with partition assignments overlaid and colors according to response:
all_cells_by_response
Here are all cells colored by patient:
all_cells_by_pt
Both are basically identical to what we had before.
I think it would be worthwhile to specifically highlight AML blast genes like this:
aml_blast_markers